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Abstract 



Partitioning a graph into groups of vertices such that those within 
each group are more densely connected than vertices assigned to different 
groups, known as graph clustering, is often used to gain insight into the or- 
ganisation of large scale networks and for visualisation purposes. Whereas 
a large number of dedicated techniques have been recently proposed for 
static graphs, the design of on-line graph clustering methods tailored for 
^ , evolving networks is a challenging problem, and much less documented in 

the literature. Motivated by the broad variety of applications concerned, 
ranging from the study of biological networks to the analysis of networks of 
CO . scientific references through the exploration of communications networks 

such as the World Wide Web, it is the main purpose of this paper to 
introduce a novel, computationally efficient, approach to graph clustering 

■ in the evolutionary context. Namely, the method promoted in this article 
CO ' can be viewed as an incremental eigenvalue solution for the spectral clus- 
tering method described by Ng. et al. (2001). The incremental eigenvalue 
solution is a general technique for finding the approximate eigenvectors of 
a symmetric matrix given a change. As well as outlining the approach in 

■ detail, we present a theoretical bound on the quality of the approximate 

eigenvectors using perturbation theory. We then derive a novel spectral 
clustering algorithm called Incremental Approximate Spectral Clustering 
(IASC). The IASC algorithm is simple to implement and its efficacy is 
demonstrated on both synthetic and real datasets modelling the evolution 
of a HIV epidemic, a citation network and the purchase history graph of 
an e-commerce website. 

1 Introduction 

Graph-mining has recently received increasing attention in the machine- learning 
literature, motivated by application domains such as the Internet, social net- 
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works, epidemics of transmissible infectious diseases, sensor and biological net- 
works. A crucial task in exploratory analysis and data visualization is graph 
clustering [311 110] , which aims to partition the vertices in a graph into groups 
or clusters, with dense internal connections and few connections between each 
other. There is a large body of work on graph clustering. A possible approach is 
to consider a certain measure that quantifies community structure and formulate 
the clustering issue as an optimisation problem (which is generally NP-hard), 
for which fairly good solutions can be obtained recursively or by using adequate 
metaheuristics, see [221 HU El ISO] f° r example. A variety of approaches to graph 
clustering exist, such as those based on modularity maximisation for instance, 
see [23] [22]. In this paper, focus is on the spectral clustering approach [36], 
which performs well empirically, is often simple to implement and benefits com- 
putationally from the availability of fast linear algebra libraries. The general 
principle of spectral clustering is to compute the smallest eigenvectors of some 
particular matrix L (refer to Section [5] for further details) and then cluster the 
vertices based on their representation in the corresponding eigen-space. The 
popularity of this approach arises from the fact that the obtained clustering of 
vertices is closely connected to the spectral relaxation of the minimization of 
the normalised cut criterion, see |33| . 

In many applications such as communications networks (e.g. the Web and 
Internet), biological networks (of proteins, metabolic reactions, etc.), social net- 
works or networks of scientific citations for instance, the graphs of interest slowly 
change with time. A naive approach to this incremental problem is to cluster 
each graph in the sequence separately, however this is computationally expensive 
for spectral clustering as the cost of solving the eigenvalue problem is 0(n 3 ) at 
each iteration, where n is the number of vertices. There has been some previous 
work on the incremental spectral clustering problem, for example [35] [26j [25j [15] 
however only [3H1 [23 [H] update the eigen-system. In this paper we propose an 
efficient method for clustering a sequence of graphs which leverages the eigen- 
decomposition and the clustering on the previous graph to find the new clus- 
tering. Firstly, a fast approximation of a rank-fc eigen-decomposition of L t+ i 
knowing that of L t is derived from the Singular Value Decomposition (SVD) 
updating approach used for Latent Semantic Indexing in [JO]. Here, the update 
is efficient to compute when the change (defined in the subsequent analysis) 
between L t and L t+ i is small. Secondly the clustering of vertices is updated 
accordingly to the new eigen-space. The efficiency of the complete approach, in 
terms of clustering accuracy, is demonstrated on synthetic and real data. We 
point out that the clustering approach was first outlined in [5] . Here we provide 
a theoretical analysis on the quality of the eigen-approximation, as well as a 
more extensive empirical study of the algorithm. 

The paper is organised as follows. Standard spectral clustering and SVD 
updating approaches are recalled in Sections [2] and [3] Then Section [4] details 
the proposed eigen-decomposition update and Section [3] studies the accuracy of 
the resulting approximate eigenvectors using perturbation theory. In Section 
[5] we show how the eigen-decomposition updates can be applied to spectral 
clustering. Numerical results are gathered in Section and the paper ends 
with Section [8] discussing results and planned future work. 

Notation: A bold uppercase letter represents a matrix, e.g. X, and a column 
vector is displayed using a bold lowercase letter e.g. x. The transpose of a matrix 
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or vector is written X T . The concatenation of the columns of two matrices A 
and B is written [A B]. A[J, J] represent the submatrix of A formed using the 
row and columns indexed by /, and A[7, :] and A[:,7] are submatrices formed 
using the rows and columns / respectively. The matrix A*, is that formed using 
the largest k eigenvectors and eigenvalues of A, however I p is the p x p identity 
matrix. 

2 Graph Clustering 

Consider an undirected graph G — (V,E), composed of a set of vertices V — 
{vi, ...,v n } and edges E C V x V such that for every edge (Vi, vj) there is also 
an edge (vj, Vi). One way of representing the edges of G is using an adjacency 
matrix A G {0,1}™ X ™ which has Ay = 1 if there is an edge from the i-th 
to the j-th vertex and Ajj = otherwise. More generally, the weight matrix 
W G R nxn allows one to assign nonzero numerical values on edges, and thus 
Wy ^ when there is an edge from the i-th to the j-th vertex. 

In the perspective of spectral clustering, a useful way of representing G is 
through its Laplacian matrix [3] . There are several definitions of the Laplacian 
matrix, however we are interested in the normalised Laplacian matrix, which is 
symmetric and positive semi-definite. We recall its definition below for clarity. 

Definition 2.1. The unnormalised Laplacian matrix of a graph G is defined as 
L = D — W where D is the degree matrix with supposedly nonzero diagonal en- 
tries Da = deg(vi), denoting by deg{vi) — Y] , Wij the degree of the i-th vertex, 
and zeros elsewhere, and W is the weight matrix. The normalised Laplacian 
matrix of a graph G is then defined as 

L = D ?LD K 

The normalised Laplacian matrix is used in the spectral clustering approach 
of Ng et al. [24]. The algorithm computes the Laplacian and then finds the 
k smallest eigenvectors which are used for clustering in conjunction with the 
k- means algorithm, see Algorithm [1] The Laplacian matrix is often sparse and 
one can use power or Krylov subspace methods such as the Lanczos method to 
find the eigenvectors. There are several variants of Algorithm [1] such as that 
of |33] which uses the so-called random walk Laplacian and clusters the small- 
est eigenvectors in a similar way. One of the motivations for these clustering 
methods is from the spectral relaxation of the minimisation of the normalised 
cut criterion. 



Algorithm 1 Spectral Clustering using the Normalised Laplacian [21] 
Require: Graph G with weight matrix W G M nxn , number of clusters k 

l: Find k smallest eigenvectors Vfc = [vi, . . . ,Vfe] of normalised Laplacian L 

2: Normalise the rows of i.e. <— diag(VfcV^) - 2 

3: Cluster rows of Vfc with the fc-means algorithm 

4: return Cluster membership vector c G {!,..., k} n 



A naive approach to spectral clustering on a sequence of graphs has a large 
update cost due to the computation of the eigen-decomposition of L t at each 
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iteration t. A more efficient approach is to update relevant eigenvectors from 
one iteration to the next. In [35] the authors use the spectral clustering of 
[2"4") however they do not update the eigenvectors incrementally but instead the 
clustering directly. The iterative clustering approach of Ning et al. [2S1 [21] uses 
the spectral clustering method given in [33] and updates eigenvectors incre- 
mentally. The algorithm incrementally updates the solution of the generalised 
eigenproblem Lv = ADv by finding the derivatives on the eigenvalues/vectors 
with perturbations in all of the quantities involved. An iterative refinement al- 
gorithm is given for the eigenvalues and eigenvectors given a change in the edges 
or vertices of a graph. One then clusters the resulting k smallest eigenvectors 
using fc-means clustering. In order to limit errors which can build up cumu- 
latively the authors recompute the eigenvectors after every 77-th graph in the 
sequence. A modification of the algorithm is proposed in [15] which improves 
efficiency by modelling clusters using a set of representative points (a similar 
strategy is used in [35] for example to cluster points in R d ). 

A disadvantage of the approach of [26] [25] lies in the fact that, to update 
an eigenvector, one must invert a small matrix for each weight change in the 
graph which makes updates costly. The size of this matrix is proportional to the 
number of neighbours of the vertices incident to the changed edge. In contrast, 
the approach presented in our paper has a smaller update cost for a set of vertex 
or edge weight changes between L t and L t+ i, since changes are considered in a 
batch fashion, as will later become clear. A further problem with the Ning et 
al. approach is that eigenvectors are updated independently of one another and 
hence one loses the orthogonality vfDvj = S(i,j), where 6 is the Kronecker 
delta function (taking the value 1 if i = j and otherwise), and vectors can 
become correlated for example. 

Another way of improving the efficiency of spectral clustering is to com- 
pute approximate eigen-decompositions at each iteration, for example by using 
the Nystrom approach |38j . The Nystrom method is used for spectral graph 
clustering in [TT] in conjunction with image segmentation. To estimate the 
eigenvalues and eigenvectors of A 6 M. nxn , one first finds a matrix A [7, 7] in 
which 7 e {1, . . .n} m is a set of indices selected uniformly at random. If we 
assume that A [7, 7] is positive definite, we can take its square root A[7, 7] 1//2 . 
One then defines 

S = A[7, 1]+A[I,I]- 1 ' 2 A[lJ]A[I,i] T A[I,I]- x ' 2 , 

where 7 is the complement of 7, and diagonalises S using the eigen-decomposition 
S = UAU T . Let V = A[:, 7]A[7, 7] -1 / 2 UA -1 / 2 , then it can be shown that the 
Nystrom approximation of A, given by Aj = A[:, 7] A [7, 7]~ 1 A[7, :], is equiv- 
alent to VAV T . If A is indefinite a more complicated two-step procedure is 
required, see [TTJ for details. The resulting approximation is applied to find 
the first few eigenvectors of the normalised Laplacian matrix at a total cost of 
0(nm 2 + m 3 ). Several efficiency improvements based on this approach are pro- 
posed in [Jj5] which finds the largest k approximate eigenvectors of D~ 5 WD 5 
at a reduced time and space complexity. 

The quality of the resulting approximation is determined by the extent the 
submatrix A[7, 7] is spanned by A[7, :]. One would naturally expect a good ap- 
proximation for example when A [7, :] spans the space of A. The choice of sam- 
pling of 7 also effects approximation quality, and empirical and theoretical work 
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in [TB] suggests random sampling without replacement over the non-uniform 
sampling in [7J [5] . Hence for our later empirical work, we will use uniform ran- 
dom sampling without replacement. In addition to |16j . error bounds for the 
Nystrom method are presented in [7J [2D] in terms of the matrix approximation 

error using the Frobenius or spectral norm (given by ||A|| F = ^/tr(A T A) and 

|| A || 2 = J A ma2; (A T A) respectively). One of the disadvantages of applying the 
Nystrom based approaches of [111119] on graphs is that by sampling only a subset 
of columns of the normalised Laplacian one can exclude important edges which 
help define clusters. These approaches are naturally more effective when a set 
of points {xi, . . . , x„} 6 K d is mapped into a weighted graph using for example 
the Gaussian weighted distance Wjj = exp(— ||xj — Xj || 2 )/2cr 2 , for a 6 R + , 
which is very informative about the relative positions of the points. 

3 SVD Updating 

An important aspect of the incremental clustering method described later lies 
in its ability to efficiently compute the eigenvectors of a matrix from the eigen- 
vectors of a submatrix, and for clarity's sake, we outline the SVD-updating 
algorithm of [3UJ. The SVD of A e R" 1 *™ i s the decomposition 

A = PSQ T , 

where P — [p 1; . . . , p m ], Q — [q 1; . . . , qj and £ = diag(oi, . . . , ay) are respec- 
tively the orthogonal matrices of left and right singular vectors and a diagonal 
matrix of singular values <j\ > a% >,...,> ay, with r — min(m, n). 

In [3D], the authors use the SVD of A to approximate the SVD of [A B], 
with B £ R mxp , without recomputing the SVD of the new matrix. It is known 
that the best fc-rank approximation of A, using the Frobenius norm error, is 
given by its SVD, namely 

A fc = P fc S fe Q^, 

where P&, Q fc , correspond to the fc largest singular values. The general idea 
of the algorithm is to write from A& 's SVD as 

[A fc B] = SAR T , 

in which S and R are matrices with orthonormal columns. One then takes the 
rank-fc SVD A = GkTki^k an d (SG^r^, (RH^) T is the rank-fc approximation 
for [A B]. The dimensionality of A is generally much smaller than that of 
[Afc B] and hence the corresponding SVD approximation is computationally 
inexpensive. 

This approach is more accurate that those presented in [3] [37] , however it 
comes at additional computational cost. Furthermore, the authors prove that 
when the matrix [A B] T [A B] has the form X + a 2 I in which X is symmetric 
positive semi-definite with rank-Zc then the rank-fc approximation of [A B] is 
identical to that of [Afc B] . 
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4 Incremental Eigen-approximation 



In this section we address three types of updating problem upon the largest 
fc eigenvectors of a symmetric matrix. The updates are general operations, 
however they will be explained in the context of spectral clustering later in 
Section [5] Assuming that Yi, Y2 G W ixp and one does not have direct access 
to A G K mxn and B G R mxp but only to the matrices C = A T A, A T B and 
B T B, the updates are: 

1. Addition of a low-rank symmetric matrix C — > C + U where U = YiYj + 
Y 2 Yf 

2. Addition of rows and columns A T A — > [A B] T [A B] 

3. Removal of rows and columns [A B] T [A B] — > A T A 

We have written the updates above in terms of a symmetric matrix A T A 
to improve notation. Note that any positive semi-definite symmetric matrix M 
can be decomposed into the form M = A T A, where A has real entries, for 
example by using an eigen-decomposition or incomplete Cholesky factorisation. 

4.1 Addition of a Low-rank Symmetric Matrix 

The first type of eigen-approximation we are interested in is the addition of a 
low-rank symmetric matrix. One computes the eigen-decomposition of C and 
then approximates the rank-fc decomposition of + U where U = YiY^ + 
Y2Y1 and Cfc is the approximation of C using the fc largest eigenvectors (also 
known as the best fc-rank approximation of C). A similar but not applicable 
update is considered for the SVD case in |40] in which the rank-fc approximation 

of A fc +YiY^ is found from A k in which Yi G K" 1 *-? and Y 2 G R n * j for some j. 
The general idea in our case is to find a matrix with orthonormal columns Q such 

_ t 

that Cfc + U = QAQ for a square matrix A. To this purpose, we first project 
the columns of Yi into the space orthogonal to the k largest eigenvectors Q fc of 
C (note the deviation from standard notation), a process known as deflation. 
Assuming that eigenvectors have unit norm, the matrix Yi is thus deflated as 
follows: 

Y x = (I - Q fc Qfc)Yi, 

at a cost of 0(npk). Note that Yi@i for some @i, is orthogonal to Q fc since 
Q^Yi©! = (Qfc Yx - q£Yi)©i = 0. If we take the SVD Yi = PiSiQ^ 
then Pi is orthogonal to Q fc since Pi = YiQ 1 S 1 T assuming Si has nonzero 
diagonal entries. 

At the next stage we would like to orthogonalise the columns of Y 2 with 
respect to both Q fe and Pi. Hence we deflate Y 2 in the following way: 

Y 2 = (I-P 1 Pf-Q fc Q^)Y 2 , 

at cost 0{npk), where we have used the fact that Pi is orthogonal to Q fe . 
Proved in a similar way to the step used earlier, the matrix in the column space 
of Y 2 , Y 2 @ 2 for some © 2 , is orthogonal to Q fe and Pi. Hence, we compute the 
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SVD Y 2 = P2S2Q2 an d n °te that the matrices Pi, P2 and Q fe are mutually 
orthogonal and span the space spanned by Cfc + U. This allows one to write 



U = QAQ T 



as required with Q 
equivalcntly 



= [Q fe Pi P 3 ] € R" x ( fc+2 P) and A = Q T (C fe + U)Q, or 



S 2 Q 2 Yj Q k 



Qlup! 
pfup! 
S2Q2 Qi^i 



Qfc YiQ 2 S 2 

_ - T — - 

SiQ! Q2S2 




in which A S R( fe + 2 P) x ( fc + 2 W. We take the rank-fc eigen-decomposition A^ = 
HfellfeH^ and then the final eigen- approximation is given by (QHfe)IIfe(QHfc) T 
in which it is easy to verify that the columns of QHfe are orthonormal. 

The deflation and SVD of Yi and Y 2 cost 0(npk) and (D(np 2 ) respectively 
and the eigen-decomposition of A is 0((k + 2p) 3 ). In order to compute A 



one can reuse the computations of Q fc Yi, Q fc Y2 



- T 
PlY 2 



which are used for 



deflations and also the matrices used for the SVD decompositions. Thus A 
is found in 0(p 3 + p 2 k +pk 2 ), and the overall complexity of this algorithm is 
0((k 2 +p 2 )(p + k) + np{p + k)). Of note here is that n scales the complexity in 
a linear fashion, however costs are cubically related to k and p. 



4.2 Addition of Rows and Columns 

In correspondence with the SVD-updating method given above we consider the 
case in which one has the eigen-decomposition of C and then wants to find the 
rank-fc approximation of E = [A B] T [A B]. Such a process is useful not 
just in incremental clustering but also in incrementally solving kernel Principal 
Components Analysis (KPCA, [35]) for example. This update can be written 
in terms of that described above. This is seen by writing the former update 
A T A -> [A B] T [A B] in terms of the latter: 



" A T A " 




' A T A " 




A T B 





— >• 





+ 


B T A B T B 



The second term on the right hand side can be written as YiY^ + Y2Y^" 
where Yi = [0 l p ] T and Y 2 = [B T A iB T B] T . The eigenvectors of the first 
matrix on the right-hand side are found from those of A T A by simply adding 
p zero rows to the existing eigenvectors, and the corresponding eigenvalues are 
identical. Additional eigenvectors are standard unit vectors spanning the p new 
rows with corresponding eigenvalues as zero. A useful insight is that the deflated 
matrix Yi = Yi and hence its SVD decomposition can be written directly as 
Y x = [0 I P ]%I P . 



4.2.1 Alternative Approach 

Here we outline a simpler and more direct approach for the addition of rows and 
columns to a matrix. First let C = QJ1Q T in which Q is a matrix of eigenvectors 
and O is a diagonal matrix of eigenvalues. Note that E = [A k B] T [Afc B] 

~ T 

can be written as QAQ for a square matrix A. In our case we have 
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Q 



o 



o 

In 



and A 



B T A k Q k 



Qfc A fc B 

B T B 



noting that A k Q k Q h 
note that Q^A T B = 



= Q fc n fc Q£Q fc Q£ 



Afc since Q fe Q fe = I. Furthermore, 



k 

Q k A k 'B and calculating 



Q fc QSP J B 

A^B is Ojnpk). It follows that A e R( fe +p)><( fe +p) can be found using Q fc , 
A T A, and A T B and B T B. In the final step, and analogously to the SVD case 
we take the rank-fc eigen-decomposition A = HfcllfcHfc at a cost of 0((k + p) 3 ) 
and then the rank-fc eigen approximation of E is given by (QHfc)IIfc(QHfc) T . 

A similar eigen-update is considered in |17] and used in conjunction with 
PCA [T3] and KPCA in [T7]. In this method the authors phrase the problem 
as a series of SVD updates, and we use a more direct approach. Furthermore, 
the eigen-approximation above is identical to that at the start of Section 14.21 
with Yi = [0 I p ] T and Y 2 = [B T A fc ±B T B] T . The difference between the 
method above and that of Section 14.21 is the former uses Afc as opposed to A in 
Y2 which results in a greater error. 



4.3 Removing Rows and Columns 

Observe that removing rows and columns is equivalent to zeroing the corre- 
sponding rows/columns: 



A T B " 
B T A B T B J ' 

and in this form one can see the connection to Section [4.2l Again, one can write 
the second term on the right hand side as YiY^ + Y 2 Y^ where Yi = [0 I p ] T 

andY 2 = -[B A iB B] T where B A and B B are found using the rank-A: 
approximation of C. Since we are updating the rank-fc approximation of C, the 
final eigen-approximation will have zero elements in the eigenvectors at rows 
corresponding to those row/columns that are deleted. 



A A A B 
B T A B T B 



A A A B 
B T A B T B 



4.4 Interpretation 

A common theme in the updates outlined above is that one can write them all 
in terms of the addition of a low-rank symmetric matrix U to a positive semi- 
definite matrix C. Our approximation method computes Cfc + U via its cxprcs- 
sion as QHfcllfcHfc Q where Q is a matrix with orthogonal columns spanning 
Cfc + U and Hfc and Ilfc represent the largest k eigenvector and eigenvalues of 
a matrix A = Q (Cfc + U)Q. The following lemma shows the consequence of 
this approach. 

Lemma 4.1. Decompose a matrix Z= FAF T where F is any matrix with or- 
thonormal columns F T F = I. For some k find the best rank-k eigen-approximation 
Afc = JJfcllfciJfc in which H k and Tl k are the largest k eigenvectors and eigenval- 
ues of A, and let Z= FH k Tl k H^ F T . Then the best rank-k eigen- decompostion 
of Z is given by: 

z k =u k s k ul = z. 
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Proof. Note that A = F T ZF due to the orthogonality of F. Let u, s be the 
eigenvectors and eigenvalues of Z, and define u = Fv for some v then F T ZFv = 
sv. This implies that the eigenvalues of A are the same as those of Z and the 
eigenvectors are related by U = FV. Hence we have V = H and S = II which 
implies Z = FH/filfcH^ F T = UfcSfcU^ = Zfc as required. The only condition 
on F is that the eigenvectors of Z are in the column space of F. This must be 
the case however since Z = USU T = FVSV T F T . □ 

Hence, the update Cfe + U of Section 14.11 is identical to the best rank-fc 
approximation of + U. 

5 Eigen-approximation Quality 

This section aims to bound the quality of the proposed eigen-approximation 
approach. As mentioned in Section 14.41 the proposed approach replaces the 
expected best rank-Ac approximation of a matrix A + B by the best rank-Ac 
approximation of Afc + B, where A& denotes the best rank- A; approximation 
of A. Hence, our objective is to control the difference between (A + B)fc and 
(A k +B) k . 

The result in Section 14.41 gives us a first insight into the approximation 
error of the updates described: the residual matrix is that corresponding to the 
eigenvectors and eigenvalues after k. One can see that for any matrix C 

n 

||c-c fc || 2 F = ||cv|| 2 F = <•>?. 

i=k+l 

where u>i is the ith eigenvalue of C (unless otherwise stated eigenvalues are 
always given in descending order) and Ck 1 - is the approximation of C using 
eigenvectors/eigenvalues after k. This implies that C is well approximated by 
the largest k eigenvectors if the sum of the square of the remaining eigenvalues 
is small. 

This certainly gives us insight into when our eigen-updating approach will 
be accurate, however the kind of matrices we will work with do not have this 
property in general. We now turn to matrix perturbation theory [34] in order 
to learn more about the approximated eigenvectors of the updated matrix. In 
a nutshell, it lies in controlling the angle between two invariant subspaces. Be- 
fore giving the corresponding theorem, the following subsection introduces the 
necessary notions of invariant subspaces of a matrix and of the canonical angles 
between subspaces. 

5.1 Invariant Subspaces and Canonical Angles 

We begin by introducing the simple concept of an invariant subspace. 

Definition 5.1. The subspace X is an invariant subspace of A if AX C X. 

It can also be shown that if the columns of X form a basis for X of A 
then there is a unique matrix L such that AX = XL. The matrix L is a 
representation of A with respect to the basis X, and it has identical eigenvalues 
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to A. A useful decomposition in perturbation theory is to reduce A to a block 
diagonal form. Let Xi be an orthogonal matrix which spans the invariant 
subspace of A, X\, and assume that we have a matrix Y2 such that [X1Y2] is 
unitary and Y2 spans the space orthogonal to X\ , then this allows us to write 



[X 1 Y 2 ] T A[X 1 Y 2 ] = 



Li H 
L 2 



(1) 



in which Li = X^AXi, L 2 = Yj AY 2 and H = X[AY 2 . The above equation 
is known as the reduced form of A with respect to [XiY 2 ]. The proof of why the 
bottom left block of this matrix is zero is straightforward, see [31] for details. 
This gives us the knowledge to define a simple invariant subspace. 

Definition 5.2. Let X be an invariant subspace of A and consider the reduced 
form of Equation {!]), then X is a simple invariant subspace of A if there are 
no common eigenvalues between L\ and X 2 . 

Notice that a simple invariant subspace has a complementary space, defined 
as follows. 

Definition 5.3. Let the simple invariant subspace X\ have the reduced form of 
Equation ([T]) with respect to the orthogonal matrix [Xi Y 2 ] . Then there exist X 2 
and Yi such that [XlJl 2 ] _1 = [Yi Y 2 ] T and 

A = X 1 L 1 Yl + X 2 L 2 Yl, (2) 

where Li = Yj AXi, i = 1,2. This form of A is known as the spectral resolu- 
tion of A along X\ and X2 . 

This allows us to introduce a theorem essential to our main result. However, 
first we must define the notion of angle between two subspaces. 

Theorem 5.1. Let X x , Yt e R nxe with Xj X 1 = I and Y[ Y y = I. Lf 21 < n, 

there are unitary matrices Q, Un, Vn such that 




QX l Un=\ andQY l Vu = 





in which T = diagi^ji, . . . , 7^) with < 71 < . . . < 7^ . S = diag(ai, . . . , o~i) with 
0-1 > ■ ■ ■ > <J£ > 0, and 7? + of = 1, i = 1, . . . If 21 > n then Q, U n , V n 
can be chosen so that 

( In-l \ 

QX^x^l I 2e - n andQY 1 V 11 = 
\ j 

in which T — diag("/i, . . . , ^n-t) with < 71 < . . . < J n -£> S = diag(a\, . . . , a n -t>) 
with o"i > • • • > fj n _f > 0, and 7? + af — 1, i — 1, . . . , n — I. 

Geometrically, let X\ and 3^1 be subspaces of dimension i and Q be a unitary 
transformation. Then the matrices QXiUn and QYiVn, with Xi 6 X\, 
X 2 € X 2 , Q S Q , form bases of QXi and Q^i and Oi and 7^ can be regarded 
as sines and cosines of the angles between the bases. We now define a measure 
of similarity between subspaces. 
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Definition 5.4. Let X and y be subspaces of the same dimension, then the 
canonical angles between the subspaces are the diagonal entries of the matrix 
sin -1 £ = diag{sm~ 1 (<ti ),..., sin (a n -i)) where X is the matrix defined in 
Theorem I5.il 

It follows that £ is a measure of how two subspaces differ. 



5.2 Angle Between Exact and Updated rank-fc Approxi- 
mations 

We are now ready to state the main theorem required for our result. 

Theorem 5.2. Let A be a Hermitian matrix with spectral resolution given by 
[X 1 X 2 ] T A[X 1 X 2 ] = diag(L 1 ,L 2 ) where [X 1 X 2 ] is unitary. Let Z eW lxk have 
orthonormal columns, M be any Hermitian matrix of order k and define the 
residual matrix as R = AZ — ZM. Let X(A) represent the set of eigenvalues of 
A and suppose that A(M) C [a, (3} and for some S > 0, X(L 2 ) C R\ [a — S, /3 + S]. 
Then, for any unitary invariant norm, we have: 

\\ S mQ(n(X 1 ),n(Z))\\< ^fi (3) 

where IZ(-) is the column space of a matrix. 

Before we introduce the main result we present a result by Weyl |37j which 
characterises the perturbation in the eigenvalues of a matrix. 

Theorem 5.3 (Weyl, [37]). Define A e R nxn and let A = A + E be its 

perturbation. The eigenvalues of A and E are given by Xi and €i, i = 1, . . . , n, 
respectively. Then the eigenvalues of A are, for i = 1, . . . , n, Aj € [Aj + £n, Xi + 

We can now present our main result which is closely related to the Davis- 
Kahan theorem [3]. 

Theorem 5.4. Consider a positive semi-definite matrix A £ R nXTl with eigen- 
values u>x, . . . ,u>„ and corresponding eigenvectors Q = [q l , . . . , q n ] . Let B G 
R nx ™ be symmetric with eigenvalues ti and ( A + B) be positive semi- definite 
with eigen- decomposition UTU T where ji are eigenvalues, i = l,...,n. Fix 
integer k, let A k + B have decomposition VYVV T with eigenvalues %i, . . . ,Tr n 
and assume 7^ ^ 7fc+i- Then the following bounds hold on the canonical angles 
between the subspaces defined by Uk and ~V k , assuming ix k > 7fc+i, 



tr[V T k A 2 ^Vk) 



\ S mQ(n(U k ),K(V k ))\\ F < v ' - .* -, (4) 

TTfc - 7fe+l 



JX max {V T k Al^V k ) 

\B^Q{n{Uk),n{v k ))h<^ =-= , (5) 

TTfc — 7fe+l 



where ^ k +i = ^k+i + ^k+i 
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Proof. We will start by considering the first bound. It is clear that XJ k is 
a simple invariant subspace for (A + B). The spectral resolution of A + B is 
given by [UfcUj.x] since this matrix is unitary and we have A + B = U^U^ (A + 
B)U fe U^ + U fc iU[i(A + B)U fc iU[i. In the reduced form Li = T k and 
L2 = T k ±. Furthermore, we set M = life and Z = V^. The residual matrix is 
given by R = (A + B)V fc - V fe II fe = A k± V k and 

||R|| F = ^/tr(Vi , A|xV fc ). (6) 

We know that the eigenvalues of M fall within the range [7Tfc,7ri] and those 
of L2 are bounded using Theorem 15.31 in the range 7$ E [iTi + u) n , Wi + uik+t] 
for i = k + 1, . . . ,n. Considering also the perturbation of eigenvalues of A 
we can write 7, < % — min(w fc+ i + 7r fc+ i,w fc +i + ei) = ^k+i + ^k+i given 
TTfc+i < Wfe+i +ei. It follows that S = ir k — jk+i and plugging into Theorem l5.2l 
gives the required result. 

The second bound is proved similarly except in this case we have 

||R|| 2 - VWV^Vfc). 

□ 

Thus we have a bound on the angle between the subspace of the first k 
eigenvectors of A k + B and the corresponding eigenvectors of it perturbation 
A + B without explicitly requiring the eigen-decomposition of A + B. Provided 
the eigenvalues of A|j_ V/s are small and the eigengap ir k — j k +i is large one 
can be sure that the two subspaces have small canonical angles. One can see 
that under small perturbations is close to Q fc and hence V^Q fc j_ is small, 
resulting in tight bounds in the angles. In the case that the matrices involved 
correspond to normalised Laplacians this result corresponds well with similar 
results outlining a perturbation-based motivation of spectral cluster (see e.g. 
|36j ) which state that if the value of 7^ — 7fc+i is large then one might reasonably 
expect a good clustering. The bound becomes loose when this eigengap is small, 
however in this case the clusters are less distinct even when computing the exact 
eigenvectors. 

6 Incremental Cluster Membership 

We now return to the eigenproblem of Algorithm [TJ Lv = Av, in which we are 
interested in the eigenvectors with the smallest eigenvalues. Define the shifted 
Laplacian as 

L = 2I L = I + D"5WD ~i, 

which is positive semi-definite since L is positive semi-definite with largest eigen- 
value 2. Note that by negating a matrix one negates the eigenvalues, leaving the 
eigenvectors the same, and similarly an addition of eri increases the eigenvalues 
by a leaving the eigenvectors intact. Since we are interested in the smallest 
eigenvectors of L they correspond exactly to the maximum eigenvectors of L 
and we can use the eigen-update methods described above. Observe that the 
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shifted Laplacian is a normalised version of the signless Laplacian |13j defined as 
L+ = D+W, and this can be seen from L = D~^(D+W)D~5 = D ^L+D ~K 
Putting the ingredients together allows us to outline an efficient incremental 
method for performing graph clustering called Incremental Approximate Spec- 
tral Clustering (IASC), see Algorithm^ At a high level the algorithm is quite 
simple: in the initialisation steps one computes the shifted Laplacian matrix 
for the first graph Li and then performs fc-means clustering using the largest 
fc eigenvectors of this matrix. For the t-th successive graph, t > 1, we use the 
eigen-update methods above to approximate the largest eigenvectors of L t using 
the approximate eigenvectors computed at the previous iteration in step llOl For 
these updates, it is simple to recover the matrices Yi and Y2 given a change 
in edge weights. Note that we use the first method of Section to compute 
eigenvector upon the addition of row and columns. Furthermore, we store the 
first I eigenvectors of Li, with I > fc, and recompute eigenvectors every T itera- 
tions in order to reduce cumulative errors introduced in the loop at the expense 
of increased computation. When I — n one recovers the exact eigenvectors at 
each iteration and Algorithms [5] and [1] become nearly equivalent. In the case 
that there is a significant eigengap between eigenvalues, one can fix both I and 
k according to Theorem 15.41 



Algorithm 2 Incremental Approximate Spectral Clustering 
Require: Graphs G\, . . . , Gt of sizes n\, . . . , fix, no. clusters k, approximation 
rank £ > fc, eigen-decomposition recomputation step R 
1: Compute the shifted Laplacian for Gi, Li 

2: Find I largest eigenvectors of Li, let Vj^ be the first k cols of vi 1 ^ 

3: Normalise the eigenvector rows, V^VdiagtV^CV^n^vW 

4: Use fc-means on rows of and store indicators Ci S {1, . . . , fc}™ 1 

5: for t = 2 -> T do 

6: Compute shifted Laplacian for Gt, Lt 

7: if i % R == then 

8: Recompute eigenvectors of L t 

9: else 

10: Use rank-£ eigen-approximation of Section @] 

11: end if 

12: Normalise rows of V^, <- diag(V^ (V^) T )^ 

13: Use fc-means on with initial centroids Ct-x, store c t G {1, . . . , fc}™* 
14: end for 

15: return Cluster membership Ci e {!,..., fc}™ 1 , . . . , Cr & {L ■ • ■ , k} nT 



The complexity of Algorithm [2] is dictated by the sparsity of the graphs 
and the extent of the change between successive graphs. We ignore the steps 
before the for-loop since in general they do not impact the overall complexity. 
At iteration t and step [S] one can compute L t from the weight and degree 
matrix at a cost of 0(nt + \E t \). In the following step if there is a change 
between edges incident to vertices S — {vi 1 , . . . , vj e } then the rows and columns 
corresponding to the union of the neighbours of S, n(S), will change in the 
corresponding shifted Laplacian. In this case p — \n(S)\ in Yi, Y2 G IR nxp 
and if the neighbourhood of the vertices with changed edges n(S) is small, this 
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update can be efficiently computed as outlined in Section^] The cost of fc-means 
is 0(k ns) where s is the number of iterations required for convergence. 



7 Computational Results 

In this section we study the clustering quality of four incremental strategies: 
the naive one which computes the exact eigenvectors at each iteration, denoted 
Exact, Ning et al.'s method (Ning) since it is a competing incremental strategy, 
IASC, and Nystrom which uses the Nystrom eigen-decomposition approximation 
of the shifted Laplacian, since it is often used in spectral clustering. First, 
however we observe the quality of the eigenvectors found using the Nystrom 
approximation and our eigen-updating approach. 



7.1 Quality of Approximate Eigenvectors 

We make a comparison between the Nystrom approximation and our eigen- 
updating methods on a synthetic dataset generated in the following way: The 
initial graph contains 4 clusters of size 250 which are generated using an Erdos- 
Renyi [8] process with edge probability p = 0.1. The Erdos-Renyi process 
creates edges independently randomly with a fixed probability p for all pairs of 
vertices. For each successive graph we then add 50 random edges to simulate 
"noise" in the clusters for a total of 100 graphs. We then compute the largest 
k = 4 eigenvectors of the shifted Laplacian using the full eigen-decomposition, 
the Nystrom method and the eigen-updating approach of Section |4l For the 
Nystrom method we use m = {600, 900} randomly selected columns and for 
the eigen-updating approach we update based on the approximate eigenvectors 
and eigenvalues found for the previous graph. In order to measure the quality 
of the approximations we apply a slightly different form of Theorem 15.21 in 
which S — min |A(Li2) — A(M)| and one uses the Frobenius norm, see Theorem 
V.3.4 of [34] for details. In this case L2 is a diagonal matrix of the smallest 
n — k eigenvalues LOk+i, . . .,uj n of the Laplacian, Z is the first k approximate 
eigenvectors and M = Z T AZ in the notations of the theorem. The experiment 
is repeated with results averaged over 20 iterations. 

Observe that the approximations given by the Nystrom method are gen- 



erally worse than that of eigen-approximation, see Figure 1(a) For example, 
with m = 900 the norm of £ is 0.739 compared to 0.253 with the eigen-updating 
approach with 4 eigenvectors for the final graph. We also tested the Nystrom ap- 
proximation in conjunction with the normalised Laplacian and found the canon- 
ical angles were worse still. Notice that the canonical angles become larger over 
time and this is due the increasing difficulty in separating the first eigenvectors 
from the remaining ones. Furthermore, when m = 600 the Nystrom eigenvectors 
separate their angles with the true ones at a faster rate than when m = 900. 



7.1.1 Perturbation Bound 

We now turn to demonstrating the effectiveness of the bound of Theorem 15.41 
on a synthetic dataset containing 150 vertices. The initial graph contains 3 
clusters of size 50 which are generated using an Erdos-Renyi process with edge 
probability p = 0.3, resulting in 1092 edges. For each successive graph we add 10 



14 



random edges and in total there is a sequence of 80 graphs. We then compute 
the bound of Theorem 15.41 to measure the difference between the canonical 
angles of the real and approximated eigenvectors of the shifted Laplacian. We 
compare the results to the bound using the real eigenvalues of the Laplacian, 
i.e. S = TTk — 7/c+i- This process is repeated 50 times with different random 
seeds and the results are averaged. 



- Nystrom m=600 

- - Nystrom m=900 

- - Eigen-update 




— Frobenius app 
Frobenius proci 




Figure 1: The left plot compares the canonical angles of the largest eigenvectors 
found using the approximation methods with the equivalent real eigenvectors. 
On the right we compare the perturbation bounds of Theorem l5.4l By "precise" 
we mean that we use 5 = tt^ — 7fc+i in the theorem. 



Figure 1(b) shows the resulting bounds for this sequence of graphs. The 
bound diverges slowly from the precise bound for many of the initial graphs, for 
example || sin9(7e(U fe ), 1l{V k ))\\ F is bounded by 0.616 ± 0.106 versus 0.352 ± 
0.024 at the 50th graph and the equivalent results for the 2-norm are 0.434 ± 
0.007 and 0.248 ± 0.02. Soon after this point, we see a large divergence in 
the precise and approximate bounds, although the approximate bounds become 
trivial after the 78th graph for the Frobenius norm and the 73rd for the 2-norm 
corresponding to the addition of 770 and 730 edges respectively to the initial 
graph. When we look at the precise bound, one can see that at the last graph 
|| sin6(7?.(U/ £ ),7?.(Vfc))||2 < 0.39 despite nearly doubling the number of edges 
which points to the precision of the eigen-approximation in this case. 



7.2 Clustering on Synthetic Data 

To evaluate the clustering approaches, two separate synthetic datasets are con- 
sidered, both of which are generated by an Erdos-Renyi process. The first 
dataset, called 3clust, is based on a graph of 3 clusters of 60 vertices each. 
This dataset allows us to compare the clustering quality as the clusters become 
more/less distinct and also the addition and removal of edges. In the corre- 
sponding graph, any possible edge between two points from the same cluster 
occurs with probability p c — 0.3 and the probability of edges between vertices 
in different clusters is selected from p g £ {0.1,0.2}. To generate a sequence of 
graphs we first allow only 20 vertices per cluster and then add 5 vertices to each 
cluster at a time until each one is of size 60. We then reverse the process so that 
the 10th graph is the same as the 8th, and the 11th is the same as the 7th etc., 
which allows us to test the clustering methods upon the removal of vertices. 
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(a) p = 0.1 



(b) p = 0.2 



Figure 2: The mean Rand Index error of the learned clustering on 3clust. The 
numbers after IASC denote the number of eigenvectors computed. 



The second dataset, discrepancy, aims to examine the clustering methods 
on a more complex set of hierarchical clusters. Here 180 vertices are split into 
3 clusters of equal size, and inside each cluster there are 3 sub-clusters. The 
initial graph is empty, and at each iteration t € {1, T}, T — 23, an edge 
{vi,Vj} is added with a probability p e + (p — p £ )£t/T, where i is set to 1 if v- t 
and Vj are in the same subgroup, 0.5 if «j and Vj are in the same group, and 
otherwise. In our case we set p e — 0.0005 and p = 0.01. 

For both datasets and the clustering approaches, fc-means is run with k 
corresponding to the number of clusters (3 and 9 for 3clust and discrepancy 
respectively). For IASC we fix the number of eigenvectors I £ {3,6,12,24} 
with the 3clust dataset and t 6 {9, 18, 72} with discrepancy in order to test 
the approximation quality as this parameter varies. With Nystrom we sample 
m = 90 columns of the Laplacian matrix to find the approximate eigenvectors 
on 3clust and select m £ {9, 18, 36, 72} for discrepancy. On discrepancy, 
the approximation methods start with the 3rd graph in the sequence to allow 
the initial graph to contains enough edges. The experiments are repeated 50 
times with different random graphs constructed using the methods described 
above and the results are averaged. Clustering accuracy is measured through 
the Rand Index |28j between the finest true clustering C and the learned one C. 
Rand Index corresponds to the proportion of true answers to the question "Are 
vertices i>j and Vj in the same cluster ?" . More formally, it is given by 



RandIndex(C,C) = 



{vi,Vj€V: Vi^Vj, 8(C(vi),C(vj)) ^ 8(C(vi),C(vj))\ 



\{vi,Vj € V : Vi=£ Vj}\ 



where C(vi) stands for the cluster index of vertex vt after clustering C, S is the 
Kronecker delta function and \£\ denotes the cardinality of any finite set £ . 

The errors on 3clust are shown in Figure [5] The first point to note is 
that the errors decrease as the cluster size increases up until graph 8 and then 
increase again as vertices are removed. This is explained by that fact that as 
the cluster size increases there are more edges within each cluster relative to 
those between clusters, hence they becomes easier to identify. Notice also that 
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Figure 3: The mean Rand Index of the learned clusterings on discrepancy. For 
IASC and Nystrom the numbers in the legend indicates the number of eigenvec- 
tors and columns sampled at each iteration. 

the methods which use the shifted Laplacian have identical results for the first 
graph and the approximation methods perform worse than exact which is what 
one might expect. When p = 0.1 IASC £ — 24 remains close to the exact eigen- 
computation, while Ning becomes sharply worse. The Nystrom approach is only 
slightly worse than exact despite using only 90 columns to approximate the 
eigen-decomposition. Note that we recompute eigenvectors for IASC and Ning 
at graph 8, and upon removal of vertices it is IASC that results in the largest 
errors except when I — 3. Ning works well with vertex removal, remaining close 
to exact. 

When p — 0.2 the clusters are less well defined and the Nystrom approach 
performs badly with an error of 0.43 versus 0.34 with exact at graph 8. If we 
compare the eigengap between the 3rd and 4th eigenvalues then they are 0.11 
and 0.04 for p = 0.1 and p = 0.2 respectively. IASC is more comparable to exact 
in this case, however again we observe that Ning clusters badly when vertices 
are added, though better upon vertex removal. It is worth noting that since £ 
is a fixed value for IASC, the approximation of the largest eigenvectors of the 
shifted Laplacian represents a smaller fraction of the total sum of eigenvalues as 
the cluster size increases. When using 24 eigenvectors for the approximation on 
the 2nd largest graph one requires approximately 14.5% of the dimensionality 
of the Laplacian, yet the rand index is close to that of exact. 

The results on discrepancy (Figure [3]) show that Ning generally does not 
generate accurate clustering until the 12th graph when eigenvectors are recom- 
puted, however it becomes competitive after this point. This is partly due to 
the fact that one solves a different eigensystem to the other methods which is 
not as effective at clustering for the initial graphs. As we increase the number 
of columns used for the Nystrom approach the rand indices improve for the first 
few graphs in which the clusters are generally not well defined. In contrast, IASC 
results are relatively accurate even for these initial graphs, and surprisingly at 
the final graph we see that with just 9 eigenvectors IASC results in the most 
accurate clustering. At this point more eigenvectors seem to make the solution 
worse, with the exact eigen-decomposition being less accurate than IASC with 
72 eigenvectors. One explanation is that an accurate eigen-decomposition fits 
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"noise" in the Laplacian which is excluded with IASC and a low value of I. 
7.3 Real- world Graphs 

We now apply the clustering methodology analysed previously on several real 
datasets. 

7.3.1 Setup 

Here we use the clustering methods on three real datasets. The first one repre- 
sents individuals in Cuba who are detected as HIV positive between the period 
1986 and 2004, see pQ for details of the related database. Edges in the graph in- 
dicate the occurrence of a sexual contact between two individuals as determined 
using contact tracing, whereby contacts of an infected person are identified and 
tested. The full sexual contact graph at the end of 2004 consists of 5389 people 
however it is strongly disconnected and we consider the growth of the largest 
component of size 2387. We find graphs of detected individuals at 1 month in- 
tervals, starting when the graph contains at least 500 vertices. At any particular 
time point, we consider the component containing first person detected in the 
largest component at the end of the recorded epidemic. 

The next dataset is the high energy physics theory citation network from 
the Arxiv publications database [12 j for the period February 1992 to March 
2002. The full dataset contains 27,770 papers with 352,807 edges and a graph is 
generated as follows: if a paper cites another, an edge is made from the former 
to the latter. Not all of the papers present in the dataset have publication dates 
and hence we use only those with dates and label cited papers with the date of 
the oldest citing paper. Taking the largest component, the final resulting graph 
consists of 15,112 vertices and 193,826 edges. To track the evolution of the 
connected component we start with the oldest paper, and consider the graphs 
of connected papers at 1 month intervals, starting with a graph of at least 500 
vertices. 

The final dataset, called Bemol, was first introduced in [33] and corresponds 
to the purchase history of an e-commerce website over a period of almost two 
years. The initial dataset is a bipartite graph between users and products com- 
posed of more than 700,000 users and 1,200,000 products. In the current ex- 
periment we focus on the first 10,000 users, and a graph is constructed between 
users with edge weights corresponding to the number of commonly purchased 
products between two users. Taking a maximum of 500 purchases per iteration 
in the graph sequence we focus on graphs 500 to 600. 

To test the clustering methods we run each on the sequences of evolving 
graphs under a variety of parameters. As the selection of the number of clusters 
is a complex issue and outside the scope of this paper and we manually choose 
this value for each dataset. The experiment is run using k — 25 clusters for HIV, 
k = 50 for Citation and k = 100 for Bemol. For HIV and IASC, £ G {25, 50, 100} 
and R = 10 and for Ning we recompute exact eigenvectors after every 10 iter- 
ations. When using Citation and Bemol, eigenvector recomputations are per- 
formed every 20 iterations and I 6 {50,100,200,500} and i G {100,200,500}. 
With Nystrom the number of columns is chosen from m G {500, 1000, 1500} for 
HIV and m G {1000, 2000, 5000} for the other datasets. Note that in our imple- 
mentation of fc-means clustering, k represents an upper bound on the number 
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of clusters found. To evaluate the learned clusters we use measures of modu- 
larity and k-way normalised cut. Let = y\ Wj ? - and r = J^d,, then the 
modularity is defined as 



A A \ 



didj 



2r 



S(ci,cj), 



where c £ K ra is the cluster indicator vector and 8 is the Kronecker delta func- 
tion. Intuitively modularity is the difference in the sum of edges weights within 
a cluster and the expected edge weights assuming the same weight distribution 
d for each vertex. The fc-way normalised cut is 



A cut between two clusters A and B is the sum of the weights between the 
clusters and the normalised cut is this sum divided by the sum of the weights 
of all edges incident to vertices in cluster A. Hence the fc-way normalised cut is 
the mean normalised cut between each cluster and its complementary vertices. 
To summarise, the greater the modularity, the better, and the lower the fc-way 
normalised cut the better. 

All experimental code is written in Python and we use an Intel Core i7-2600K 
at 3.40GHz with 16GB of RAM to conduct the simulations. The Laplacian 
matrices are stored in compressed sparse row representation and eigenvectors 
are found using Implicitly Restarted Lanczos Method in ARPACK [18 which 
computes only the required eigenvectors and not the full eigen-decomposition. 

7.3.2 Results 

Figure 2] show the resulting modularities and fc-way normalised cuts for all 
datasets however we begin by studying HIV. For both IASC and Ning, since 
eigenvectors are recomputed every 10 iterations, this can manifest itself as sud- 
den changes in the modularities and fc-way normalised cuts. These changes 
are more pronounced with Ning. We see a close correspondence of IASC and 
exact for both the modularity and normalised cut. As we observed with the 
toy datasets, a lower value of I seems to improve results with the final graph 
having a cut of 0.09 with IASC i = 100 versus 0.10 for exact. Notice that IASC 
matches or improves results on exact, while keeping only 5% or fewer of the 
final number of eigenvectors. Ning fares badly in terms of the modularity of the 
resulting clustering with a value of 0.73 versus 0.82 for the exact approach at 
the final graph. However, with the cut measure Ning provides the best cluster- 
ing, albeit with more unstable curve than the other methods. Nystrom does not 
provide a good approximation of the largest eigenvectors when the rank of the 
Laplacian exceeds the number of columns sampled. 

A similar picture emerges with Citation and we see again that IASC is close 
to exact in terms of both measures. Note that it was too costly to compute 
clustering using Ning on this and Bemol. On Citation, when I e {200, 500} we 
obtain a close match to exact in general. The results are impressive when we 
consider the change in the graphs between eigenvector recomputations: the first 
graph is of size 555, and the 19th is 2855, an increase of 2300. With the 60th 
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Figure 4: The performance of the clustering methods on the sequences of chang- 
ing graphs. 
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Figure 5: The cumulative times taken by the eigenvector computations of the 
clustering methods. 



graph there are 10063 vertices and 12135 at the 79th. Looking at the Nystrom 
curves, we again observe poor clustering performance even when m = 5000. 

Finally consider the Bemol graphs in which is difficult to find clear clusters, 
although they become more distinct over time. This is evident when looking 
at the exact curves for example: the modularity increases slightly from 0.27 
to 0.32 whereas the fc-way normalised falls from 0.74 to 0.65 from beginning to 
end. In contrast to the other datasets exact is improved upon by both IASC and 
Nystrom (when m = 5000) respectively. One of the reasons that the Nystrom 
method is effective on this data is because there are many edges and one can 
sample them out without affecting the clustering significantly. Note however 
that as we have seen in the other plots, Nystrom is rather unstable compared 
to the other methods. Furthermore, the eigenvector updates every 20 iterations 
are noticeable in the cluster measures with IASC. 

To conclude the analysis, Figure \5\ shows the timings of the eigenvector 
computations of the clustering methods for Citation and Bemol. Clearly the 
Nystrom approach costs the least in term of computation when m S {1000, 2000} 
however exceeds the time taken for exact when m = 5000. In this case the time 
is dominated by the eigen-decomposition of a matrix in R mxm . IASC improves 
over exact over the whole sequence of graphs as one does not recompute the 
eigenvectors at every iteration. Notice that the "staircase" effect in the IASC 
curves correspond the computation of the exact eigenvectors. Observe that on 
Bemol, IASC £ = 200 takes 4,956 seconds in total for eigenvector computations 
versus 31,182 for exact, a speedup factor of 6.29 for a similar cluster quality. 
The equivalent improvement is 2.26 on Citation. 



8 Discussion 

We have presented a novel incremental method for spectral graph clustering 
which updates the eigenvectors of the Laplacian in a computationally efficient 
way. Such an algorithm is useful for finding clusterings in time evolving graphs 
such as biological and social networks, and the Internet. A key part of the 
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algorithm uses the principles behind the SVD updating approach of [30] to 
derive a general way to approximate the first k eigenvectors of a perturbed 
symmetric matrix given the eigenvectors of the original matrix. The resulting 
clustering algorithm, IASC, can be easily implemented using a standard linear 
algebra library. 

We analysed IASC in both theoretical and empirical respects. Using pertur- 
bation theory, we showed when the canonical angles between the real and ap- 
proximate subspaces generated by our update algorithm are close. Furthermore, 
IASC is examined empirically relative to the computation of exact eigenvectors 
for each graph, the method of Ning et al., and the Nystrom approach. On 2 toy 
and 3 real datasets we show that IASC can often match the cluster accuracy of 
the exact approach using a small fraction of the total number of eigenvectors 
and at a much reduced computational cost. 

This work has opened up several perspectives for further study. The first 
is the analysis of the update of eigenvectors for a modularity matrix and other 
cluster quality criteria. As we have shown, the quality of the updates would 
depend on the spectrum of the matrices in question. Another interesting line of 
research is the issue of how to choose the number of clusters in the time evolving 
graphs. 
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